creased. In particular, to alleviate the disturbance caused by the binarization process, a center loss is designed to incorporate intraclass compactness with the quantization loss and the filter loss. The red arrows indicate the back-propagation process. By considering the filter loss, center loss, and softmax loss in a unified framework, we achieve much better performance than state-of-the-art binarized models. Most importantly, our MCNs are highly compressed and perform comparably to the well-known full-precision ResNets and Wide ResNets.
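To make the composition of this objective concrete, the following is a minimal PyTorch sketch of such a unified loss, assuming a simple weighted sum; the name unified_loss and the weights theta and lam are illustrative placeholders, and the exact forms of the filter and center losses here may differ from the book's formulation in Eq. 3.18.

```python
import torch
import torch.nn.functional as F

def unified_loss(logits, labels, C, C_recon, feats, centers,
                 theta=1e-3, lam=1e-3):
    # Softmax (cross-entropy) loss on the class predictions.
    softmax_loss = F.cross_entropy(logits, labels)
    # Filter loss (assumed form): gap between the unbinarized filters C
    # and their reconstructions C_hat ∘ M (cf. Eq. 3.12).
    filter_loss = ((C - C_recon) ** 2).sum()
    # Center loss (assumed form): intraclass compactness, pulling each
    # feature toward the center of its class.
    center_loss = ((feats - centers[labels]) ** 2).sum()
    # theta and lam are hypothetical weighting hyperparameters.
    return softmax_loss + theta * filter_loss + lam * center_loss
```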
As shown in Fig. 3.1, M-Filters and weights can be jointly optimized end to end, resulting in a compact and portable learning architecture. Owing to its low model complexity, such an architecture is less prone to overfitting and is suitable for resource-constrained environments. Specifically, our MCNs reduce the storage required by a full-precision model by a factor of 32 while achieving the best performance among existing binarized-filter-based CNNs, even approaching the accuracy of full-precision filters. In addition, the number of model parameters to be optimized is significantly reduced, which yields a computationally efficient CNN model.
3.4.1 Forward Propagation with Modulation
We first elaborate on MCNs as vanilla BNNs with only binarized weights. We design specific convolutional filters for use in our MCNs. Across all layers, we deploy 3D filters of size $K \times W \times W$ (one filter); each filter has $K$ planes, and each plane is a $W \times W$ 2D filter. To use such filters, we extend the input channels of the network, e.g., from RGB to RRRR or (RGB+X) with $K = 4$, where X denotes any channel. Note that we use only one channel of gray-level images. Doing so allows us to quickly implement our MCNs on existing deep-learning platforms. After this extension, we directly deploy our filters in the convolution process; the details of the MCN convolution are illustrated in Fig. 3.2(b).
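As a rough illustration of this channel extension (an assumed implementation, not the book's code), the snippet below tiles the input channels until there are $K$ of them, so a one-channel gray image becomes RRRR-style input and an RGB image becomes RGB+X with a repeated channel as X:

```python
import torch

def extend_channels(x, K=4):
    # x: (N, C, H, W). Tile the existing channels and keep the first K,
    # e.g. gray (C=1) -> four copies of the same channel, and
    # RGB (C=3) -> RGB plus one repeated channel ("RGB+X").
    n, c, h, w = x.shape
    reps = -(-K // c)  # ceil(K / c)
    return x.repeat(1, reps, 1, 1)[:, :K]

gray = torch.randn(8, 1, 32, 32)
rgb = torch.randn(8, 3, 32, 32)
print(extend_channels(gray).shape)  # torch.Size([8, 4, 32, 32])
print(extend_channels(rgb).shape)   # torch.Size([8, 4, 32, 32])
```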
To reconstruct unbinarized filters, we introduce a modulation process based on M-Filters and binarized filters. An M-Filter is a matrix that serves as the weight of the binarized filters, also of size $K \times W \times W$. Let $M_j$ be the $j$-th plane of an M-Filter. We define the operation $\circ$ for a given layer as follows:
\[
\hat{C}_i \circ M = \sum_{j=1}^{K} \hat{C}_i * M'_j, \tag{3.12}
\]
where $M'_j = (M_j, \ldots, M_j)$ is a 3D matrix built from $K$ copies of the 2D matrix $M_j$, with $j = 1, \ldots, K$, and $*$ is the element-wise multiplication operator, also termed the Schur product. In Eq. 3.12, $M$ is a learned weight matrix used to reconstruct the convolutional filters $C_i$ based on $\hat{C}_i$ and the operation $\circ$; it leads to the filter loss in Eq. 3.18. An example of filter modulation is shown in Fig. 3.2(a). In addition, the operation $\circ$ produces new matrices (named reconstructed filters), i.e., $\hat{C}_i * M'_j$, which are elaborated in the following. We define:
\[
Q_{ij} = \hat{C}_i * M'_j, \tag{3.13}
\]
\[
Q_i = \{Q_{i1}, \ldots, Q_{iK}\}. \tag{3.14}
\]
In testing, $Q_i$ is not predefined but is calculated from Eq. 3.13. An example is shown in Fig. 3.2(a). $Q_i$ is introduced to approximate the unbinarized filters $w_i$ in order to alleviate the information loss caused by the binarization process. In addition, we further require $M \geq 0$ to simplify the reconstruction process.
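Putting Eqs. 3.12-3.14 together, below is a small sketch of the modulation step for a single filter; the shapes, the sign-based binarization, and the abs()-based handling of $M \geq 0$ are illustrative assumptions rather than the book's exact implementation.

```python
import torch

K, W = 4, 3

C_i = torch.randn(K, W, W)   # unbinarized filter C_i
C_hat_i = torch.sign(C_i)    # binarized filter C_hat_i
# M-Filter, also K x W x W; abs() is one possible way to keep M >= 0
# (the text states the constraint but not how it is enforced).
M = torch.randn(K, W, W).abs()

# Eq. 3.13: Q_ij = C_hat_i * M'_j, where M'_j stacks K copies of plane M_j.
Q_i = torch.stack([C_hat_i * M[j].expand(K, W, W) for j in range(K)])

# Eq. 3.12: C_hat_i ∘ M sums the K reconstructed filters; the filter loss
# compares this reconstruction with the unbinarized C_i.
recon = Q_i.sum(dim=0)
filter_loss = ((C_i - recon) ** 2).sum()
print(Q_i.shape)    # torch.Size([4, 4, 3, 3]) -- Eq. 3.14: Q_i = {Q_i1, ..., Q_iK}
print(recon.shape)  # torch.Size([4, 3, 3])
```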